17 research outputs found

    Polynomial Time Approximation Schemes for Clustering in Low Highway Dimension Graphs

    We study clustering problems such as k-Median, k-Means, and Facility Location in graphs of low highway dimension, which is a graph parameter modeling transportation networks. It was previously shown that approximation schemes for these problems exist, which either run in quasi-polynomial time (assuming constant highway dimension) [Feldmann et al. SICOMP 2018] or run in FPT time (parameterized by the number of clusters k, the highway dimension, and the approximation factor) [Becker et al. ESA 2018, Braverman et al. 2020]. In this paper we show that a polynomial-time approximation scheme (PTAS) exists (assuming constant highway dimension). We also show that the considered problems are NP-hard on graphs of highway dimension 1.

    Experimental Evaluation of Fully Dynamic k-Means via Coresets

    For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem consists of finding $k$ centers such that the sum of squared distances from each data point to its closest center is minimized. Coresets are one of the main tools developed recently to solve this problem in a big data context. They make it possible to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data. In this work, we study coresets in a fully dynamic setting: points are added and deleted, with the goal of efficiently maintaining a coreset from which a k-means solution can be computed. Based on an algorithm from Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% on the quality of the k-means solution. Comment: Accepted at ALENEX 2
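    The k-means objective and the idea of a weighted summary described in this abstract can be illustrated with a minimal sketch. The function names `kmeans_cost` and `merge_duplicates` are hypothetical and not taken from the paper, and merging duplicate points is only a trivially exact compression used to show how a weighted point set is evaluated. It is not the coreset construction of Henzinger and Kale.

```python
from collections import Counter


def kmeans_cost(points, centers, weights=None):
    """Sum of (weighted) squared distances from each point to its closest center."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        # Squared Euclidean distance to the nearest center.
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def merge_duplicates(points):
    """Hypothetical helper: collapse identical points into weighted representatives.

    The weighted set has exactly the same k-means cost as the input for every
    choice of centers, so it behaves like a coreset with zero error.
    """
    counts = Counter(tuple(p) for p in points)
    reps = [list(p) for p in counts]
    weights = [float(c) for c in counts.values()]
    return reps, weights
```

    A real coreset would typically be much smaller than the set of distinct points and would preserve the cost only up to a small relative error, but it is queried in the same way: by evaluating the weighted cost on the summary instead of the full data.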

    A Quasi-Polynomial-Time Approximation Scheme for Vehicle Routing on Planar and Bounded-Genus Graphs

    The Capacitated Vehicle Routing problem is a generalization of the Traveling Salesman problem in which a set of clients must be visited by a collection of capacitated tours. Each tour can visit at most Q clients and must start and end at a specified depot. We present the first approximation scheme for Capacitated Vehicle Routing for non-Euclidean metrics. Specifically, we give a quasi-polynomial-time approximation scheme for Capacitated Vehicle Routing with fixed capacities on planar graphs. We also show how this result can be extended to bounded-genus graphs and polylogarithmic capacities, as well as to variations of the problem that include multiple depots and charging penalties for unvisited clients.

    Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation

    In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there is no known deterministic dimensionality reduction procedure or coreset construction that avoids an exponential dependency on the input dimension $d$, the precision parameter $\varepsilon^{-1}$, or $k$. Furthermore, there is no coreset construction that succeeds with probability $1-1/n$ and whose size does not depend on the number of input points, $n$. This has led researchers in the area to ask what is the power of randomness for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov'20]. Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension is $\Omega(1)$ for both $k$-median and $k$-means, even when allowing a complexity FPT in the number of clusters $k$. This stands in sharp contrast with the $(1+\varepsilon)$-approximation achievable in that case when allowing randomization. In this paper, we provide deterministic sketch constructions for clustering, whose size bounds are close to the best-known randomized ones. We also construct a deterministic algorithm for computing a $(1+\varepsilon)$-approximation to $k$-median and $k$-means in high-dimensional Euclidean spaces in time $2^{k^2/\varepsilon^{O(1)}} \mathrm{poly}(nd)$, close to the best randomized complexity. Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling and immediately improves over the recent results of [Braverman et al. FOCS '22] by a factor $k$. Comment: FOCS 2023. Abstract reduced for arxiv requirement

    Near-linear time approximations schemes for clustering in doubling metrics

    We consider the classic Facility Location, k-Median, and k-Means problems in metric spaces of constant doubling dimension. We give the first nearly linear-time approximation schemes for each problem, making a significant improvement over the state-of-the-art algorithms. Moreover, we show how to extend the techniques used to get the first efficient approximation schemes for the problems of prize-collecting k-Medians and k-Means, and efficient bicriteria approximation schemes for k-Medians with outliers, k-Means with outliers and k-Center.

    A New Coreset Framework for Clustering

    Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of the distances, raised to the power $z$, from every point to its closest center is minimized. This encapsulates the famous $k$-median ($z=1$) and $k$-means ($z=2$) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of the solutions, also known as coresets, has been an important research direction over the last 15 years. In this paper, we present a new, simple coreset framework that simultaneously improves upon the best known bounds for a large variety of settings, including Euclidean spaces, doubling metrics, minor-free metrics, and general metrics.
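    For reference, the objective described in this abstract can be written out as follows. The notation ($P$ for the input points, $C$ for the centers, $d$ for the metric, $\Omega$ and $w$ for the weighted summary) is an assumption for illustration, and the coreset condition is the standard textbook definition rather than a statement quoted from the paper.

```latex
% (k,z)-clustering cost of a center set C, with |C| = k, on input P
\[
  \mathrm{cost}_z(P, C) \;=\; \sum_{p \in P} \, \min_{c \in C} d(p, c)^{z}
\]
% z = 1 gives k-median, z = 2 gives k-means.

% A weighted set (\Omega, w) is an \varepsilon-coreset if, for every C with |C| = k,
\[
  \Bigl|\, \sum_{q \in \Omega} w(q) \min_{c \in C} d(q, c)^{z} \;-\; \mathrm{cost}_z(P, C) \Bigr|
  \;\le\; \varepsilon \cdot \mathrm{cost}_z(P, C)
\]
```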

    Differential Privacy for Clustering Under Continual Observation

    We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically on the number $T$ of updates. The multiplicative error is almost the same as in the non-private setting. To do so, we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for $k$-means. We also partially extend our results to the $k$-median problem.

    Fully Dynamic Consistent Facility Location

    We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, k-median and k-means has been raised in a recent paper by Hubert-Chan et al. [WWW'18] and also appears as a natural follow-up on the online model with recourse studied by Lattanzi and Vassilvitskii [ICML'17] (i.e., in insertion-only streams). In this paper, we focus on general metric spaces and mainly on the facility location problem. We give an arguably simple algorithm that maintains a constant factor approximation, with O(n log n) update time and total recourse O(n). This improves over the naive algorithm, which recomputes a solution at each time step and can take up to O(n^2) update time and O(n^2) total recourse. These bounds are nearly optimal: in a general metric space, inserting a point takes O(n) time to describe the distances to other points, and we give a simple lower bound of Ω(n) for the recourse. Moreover, we generalize this result to the k-medians and k-means problems: our algorithm maintains a constant factor approximation in time Õ(n + k^2). We complement our analysis with experiments showing that the cost of the solution maintained by our algorithm at any time t is very close to the cost of a solution obtained by quickly recomputing a solution from scratch at time t, while having a much better running time.